7 research outputs found
Neural machine translation of literary texts from English to Slovene
Neural Machine Translation has shown promising performance on literary texts. Since literary machine translation has not yet been researched for the English-to-Slovene translation direction, this paper aims to fill this gap by presenting a comparison among bespoke NMT models, tailored to novels, and Google Neural Machine Translation. The translation models were evaluated by the BLEU and METEOR metrics, assessment of fluency and adequacy, and measurement of the post-editing effort. The findings show that all evaluated approaches resulted in an increase in translation productivity. The translation model tailored to a specific author outperformed the model trained on a more diverse literary corpus on all metrics except fluency. However, the translation model by Google still outperforms all bespoke models. The evaluation reveals very low inter-rater agreement on fluency and adequacy, based on the kappa coefficient values, and significant discrepancies between post-editors. This suggests that these methods might not be reliable, which should be addressed in future studies.
This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (Insight), co-funded by the European Regional Development Fund.
Peer-reviewed.
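The inter-rater agreement mentioned in this abstract is reported as kappa coefficient values. The listing does not say which kappa variant was used, so the following is only a minimal sketch of Cohen's kappa for two raters, offered as an assumption about the general technique. The function name `cohen_kappa` and the example scores are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement, estimated from each rater's own label distribution.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical fluency scores (1-4) from two annotators; NOT data from the paper.
rater_1 = [4, 3, 3, 2, 4, 1, 2, 3]
rater_2 = [3, 3, 2, 2, 4, 2, 1, 3]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

A value near 0 means agreement is no better than chance and a value of 1 means perfect agreement, which is why low kappa values cast doubt on the reliability of the fluency and adequacy judgements.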
Automatic extraction of phrases from a corpus using the SSJ lexicon (Avtomatsko pridobivanje besednih zvez iz korpusa z uporabo leksikona SSJ)
Computational lexicography is an interdisciplinary field that focuses primarily on automating lexicographic procedures and building lexical databases of various kinds. In this paper we describe the automatic extraction of phrases from a text corpus (phrases in which an adjective agrees in gender, case, and number with the following noun) and the transformation of the extracted lexical data into a syntactically suitable final form by means of the SSJ morphological lexicon of word forms.
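The procedure described in this abstract (find an adjective that agrees with the following noun, then produce a citation form via a morphological lexicon) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Token` fields, the tag values, and the `TOY_LEXICON` layout are hypothetical and do not reflect the actual SSJ lexicon format, and the citation form is simplified to nominative singular.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str
    gender: str = ""
    case: str = ""
    number: str = ""


# Toy stand-in for a morphological lexicon (hypothetical layout, NOT the SSJ format):
# (adjective lemma, gender, case, number) -> inflected form.
TOY_LEXICON = {("velik", "fem", "nom", "sg"): "velika"}


def agrees(adj, noun):
    """True when adjective and noun share gender, case, and number."""
    return (adj.gender, adj.case, adj.number) == (noun.gender, noun.case, noun.number)


def extract_phrases(tokens, lexicon):
    """Collect adjective + noun pairs that agree, rendered in nominative singular."""
    phrases = []
    for adj, noun in zip(tokens, tokens[1:]):
        if adj.pos == "ADJ" and noun.pos == "NOUN" and agrees(adj, noun):
            # Look up the adjective form that matches the noun's gender.
            adj_form = lexicon.get((adj.lemma, noun.gender, "nom", "sg"))
            if adj_form is not None:
                # Assumption: the noun lemma is its nominative singular form.
                phrases.append(f"{adj_form} {noun.lemma}")
    return phrases


# "v veliki hiši" ("in a big house"), with hand-written hypothetical tags.
sentence = [
    Token("v", "v", "ADP"),
    Token("veliki", "velik", "ADJ", "fem", "loc", "sg"),
    Token("hiši", "hiša", "NOUN", "fem", "loc", "sg"),
]
phrases = extract_phrases(sentence, TOY_LEXICON)
print(phrases)
```

Pairs whose adjective form is missing from the lexicon are skipped rather than guessed, which keeps the output limited to forms the lexicon can confirm.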
Post-edited and error annotated machine translation corpus PE²rr 1.0
The PE²rr corpus contains source language texts from different domains along with their automatically generated translations into several morphologically rich languages, their post-edited versions, and error annotations of the performed post-edit operations. The main advantage of the corpus is the fusion of post-editing and error classification tasks, which have usually been seen as two independent tasks, although naturally they are not.
Back-translation approach for code-switching machine translation: A case study
Recently, machine translation has demonstrated significant progress in terms of translation quality. However, most of the research has focused on translating purely monolingual texts on the source and the target side of the parallel corpora, when in fact code-switching is very common in communication nowadays. Despite the importance of handling code-switching in the translation task, existing machine translation systems fail to accommodate code-switched content. In this paper, we examine the phenomenon of code-switching in machine translation for low-resource languages. Through different approaches, we evaluate the performance of our systems and make some observations about the role of code-mixing in the available corpora.
This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under grant agreement number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund, and the Enterprise Ireland (EI) Innovation Partnership Programme under grant number IP20180729, NURS – Neural Machine Translation for Under-Resourced Scenarios.
Non-peer-reviewed.